- Complex data structures (matrices, lists and data frames)
- Functions in R
- Reading data from files
- Vectors (character, numeric, integer, logical, factor)
- Matrices and arrays
- Lists
- Data frames
2026-01-20
Questions, anybody?
Elements of a vector can be accessed not only by number (index) or with logical vectors, but also by name. You can assign names to vectors:
person <- c("January", "Weiner", 134)
names(person) <- c("FirstName", "LastName", "Age")
person["FirstName"]
person["Age"]
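Names can also be given directly inside c(), and several elements can be selected by name at once. A small sketch in base R; the vector scores and its values are made up for illustration:

```r
# Hypothetical example: names given directly inside c()
scores <- c(control = 0.9, low = 0.7, high = 0.2)

names(scores)                 # "control" "low" "high"
scores["low"]                 # a single element, still carrying its name
scores[c("control", "high")]  # several elements selected by name
unname(scores["low"])         # drop the name, keep only the value: 0.7
```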
A vector always holds only one type of data. If you mix strings and numbers, R converts all elements to strings (this is why the age above is stored as "134"):
person <- c(10, 20, "30")
person * 10  # error: non-numeric argument to binary operator
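When R has to combine different types in one vector, it picks the most general type that can hold all of them (roughly: logical, then numeric, then character). The sketch below uses only the base functions class() and as.numeric() to show this, and how to convert strings back to numbers:

```r
# Mixing types in c(): R picks the most general type that fits all elements
class(c(1, 2, 3))          # "numeric"
class(c(1, TRUE, FALSE))   # "numeric": TRUE becomes 1, FALSE becomes 0
class(c(1, "a", TRUE))     # "character": everything becomes a string

# Converting the strings back to numbers makes arithmetic possible again
person <- c(10, 20, "30")
as.numeric(person) * 10    # 100 200 300
```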
samples <- c(1, 10, 23, 42, 13)
samples_n <- length(samples)
Both c and length are functions. They take some arguments (often many of them) and return a single object: a vector, a matrix or something else.
You can always assign the result of the function to a variable.
Sometimes functions return NULL, which is R for “nothing”, but which is still something you can use or assign (yet another special value! we will meet it again later).
R has two special logical values, TRUE and FALSE. They can be used to create logical vectors.
sel <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
sel
!sel  # ! negates: TRUE becomes FALSE and vice versa
Comparison operators (>, <, <=, >=, ==, !=) produce logical vectors:
samples <- c(1, 1, 2, 5, 7)
samples > 2
which(samples == 7)  # which() returns the positions of the TRUE values
which(samples != 1)
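Because TRUE counts as 1 and FALSE as 0, logical vectors are also handy for counting, and several conditions can be combined with & ("and"). A short base-R sketch using the samples vector from above:

```r
samples <- c(1, 1, 2, 5, 7)
big <- samples > 2                   # FALSE FALSE FALSE TRUE TRUE

sum(big)                             # counts the TRUE values: 2
which(big)                           # positions of the TRUE values: 4 5
samples[big]                         # select with the logical vector: 5 7
samples[samples > 1 & samples < 7]   # both conditions must hold: 2 5
```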
Logical vectors can be used to access elements:
persons <- c("Aphrodite", "Bacchus", "Circe", "Demeter", "Eurypides")
sel <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
persons[sel]
# we can abbreviate the TRUE and FALSE to T and F (but try to avoid it!)
greek <- persons[ c(T, F, T, T, T) ]
The reason to avoid using the T and F is that they can be overwritten by the user, and then you will have a hard time debugging your code.
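T <- FALSE
persons[ c(T, F, T, T, T) ]  # all elements are FALSE now: returns character(0)
rm(T)  # remove your own T, so that T means TRUE again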
https://youtu.be/xmeZofFlp78 (8.4 minutes)
Exercise: create a vector as follows:
samples <- c(1, 10, NA, 15)
NA stands for not available (e.g., missing data)
- What does length(samples) return?
- What does mean(samples) return? Why is that?
- Use the argument na.rm=TRUE for the mean() function. Look up help (?mean) to see how it can be used. What happens now?
- What does the is.na() function return when applied to samples?
- … NA values? Try is.na and which
- … NA?

The reason we are showing you how to create a function is that it is simple, and that it will help you understand what functions are.
#' Function name
#' Function description
some_name <- function(param1, param2=2) {
## code comment
# <your code goes in here>
}
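A filled-in version of the template above might look like this. The function scale_by and its parameter by are hypothetical names chosen for illustration; by plays the role of param2 with a default value:

```r
#' scale_by (hypothetical example)
#' Multiplies every element of x by the value of `by` (default: 2)
scale_by <- function(x, by=2) {
  ## the value of the last expression is what the function returns
  x * by
}

scale_by(1:3)      # 2 4 6 (uses the default by = 2)
scale_by(1:3, 10)  # 10 20 30
```

Arguments with a default (like by=2) can be left out when calling the function; arguments without one (like x) must be given.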
rep and paste/paste0 functions
rep is used to replicate vectors. It is a very useful shorthand when you need to generate, e.g., experimental conditions.
rep(c("A", "B"), 5)
rep(c("A", "B"), each=5)
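The second argument of rep is called times, and it can be combined with each. A few variations, all base R:

```r
rep(c("A", "B"), times=2)            # "A" "B" "A" "B"
rep(c("A", "B"), each=2)             # "A" "A" "B" "B"
rep(c("A", "B"), times=c(3, 1))      # "A" "A" "A" "B": one count per element
rep(c("A", "B"), each=2, times=2)    # "A" "A" "B" "B" "A" "A" "B" "B"
length(rep(c("A", "B"), 5))          # 10
```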
paste and paste0 are used to concatenate strings. paste separates them with a space by default (this can be changed with the sep argument); paste0 puts nothing in between.
paste("A", "B", "C")
paste0("A", "B", "C")
a <- c("A", "B", "C")
b <- c("1", "2", "3")
paste0(a, b)
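Two further arguments of paste are worth knowing: sep sets the separator between the pieces, and collapse joins all elements of the result into a single string. Shorter vectors (such as a single string) are recycled. A base-R sketch:

```r
a <- c("A", "B", "C")
b <- c("1", "2", "3")

paste(a, b)               # "A 1" "B 2" "C 3" (default sep = " ")
paste(a, b, sep="_")      # "A_1" "B_2" "C_3"
paste0("sample", 1:3)     # "sample1" "sample2" "sample3": "sample" is recycled
paste(a, collapse="+")    # "A+B+C": one single string
paste0(a, b, ".csv")      # "A1.csv" "B2.csv" "C3.csv", e.g. file names
```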
Together, these functions are really useful to generate, for example, labels, experimental conditions, names of files etc.
Exercise: using rep and paste, create a vector like this: A1, A2, A3, B1, B2, B3, C1, C2, C3
Much like vectors, matrices can only hold one data type (e.g. only numeric or only character or only logical etc.).
m <- matrix(1:18, ncol=3, nrow=6)
# compare with: m <- matrix(1:18, ncol=3, nrow=6, byrow=TRUE)
dim(m)
ncol(m)
nrow(m)
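Elements of a matrix are accessed with matrix[row, column]. Leaving the row or the column empty selects all rows or all columns.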
So, for example:
m[1, ]   # vector which is the first row
m[, 2]   # vector which is the second column
m[3, 1]  # first element of the third row
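The byrow argument mentioned above changes how the numbers are filled in, which you can see directly when indexing. By default matrix() fills column by column. A base-R sketch (m_row is just an illustrative name):

```r
m <- matrix(1:18, ncol=3, nrow=6)                  # filled column by column
m_row <- matrix(1:18, ncol=3, nrow=6, byrow=TRUE)  # filled row by row

m[1, ]       # 1 7 13
m_row[1, ]   # 1 2 3
m[, 2]       # 7 8 9 10 11 12
m[3, 1]      # 3
m[2:3, ]     # several rows at once: a smaller 2 x 3 matrix
```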
Note: it is also possible to have arrays with more than 2 dimensions (but you will probably not need them).
We can name rows and columns of a matrix and use the names to access the rows and columns:
colnames(m) <- letters[1:ncol(m)]
rownames(m) <- LETTERS[1:nrow(m)]
m["A", "b"]  # one "cell"
m["B", ]     # one row
m[ , "b"]    # one column
letters and LETTERS are built-in vectors with the lowercase and uppercase (English) alphabet. They are useful for quick labeling.
Assume you have a 48-well plate for a drug sensitivity analysis with viability scores.
- … matrix and runif. These reflect your viability scores.
Before starting your experiment, you decided to leave out the border wells to avoid edge effects:
The remaining rows are treated with inhibitor 1 at increasing concentrations (control, low, medium, high). Columns 2 to 4 are treated with inhibitor 2 at increasing concentrations (control, low, high), and columns 5 to 7 with inhibitor 3 (same concentrations as inhibitor 2).
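One possible way to set up such a plate is sketched below. It assumes the common 48-well layout of 6 rows (A to F) by 8 columns (1 to 8), so that the inner wells are rows B to E and columns 2 to 7, which matches the four concentrations of inhibitor 1 and the columns 2 to 7 above. The runif values are random placeholders, not real viability scores, and the labels such as "inh1_high" are hypothetical:

```r
set.seed(1)  # make the random numbers reproducible
# Assumption: 48-well plate laid out as 6 rows x 8 columns
plate <- matrix(runif(48), nrow=6, ncol=8)
rownames(plate) <- LETTERS[1:6]
colnames(plate) <- 1:8

# Mark the border wells as not used
plate[c(1, 6), ] <- NA
plate[, c(1, 8)] <- NA

# The inner 4 x 6 block holds the actual measurements
inner <- plate[2:5, 2:7]

# Hypothetical treatment labels for the inner wells
inh1 <- c("control", "low", "medium", "high")  # one per inner row
inh23 <- c("control", "low", "high")           # inhibitors 2 and 3
rownames(inner) <- paste0("inh1_", inh1)
colnames(inner) <- c(paste0("inh2_", inh23), paste0("inh3_", inh23))

dim(inner)                     # 4 6
inner["inh1_high", "inh3_low"] # a single well, selected by name
```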
list() function
person <- list(name="Weiner",
Age=NA,
given="January")
To access a single element of a list, you need to use double brackets [[ ]]:
person[["name"]]
There is a shortcut:
person$name
You can add elements to a list using the $ operator:
person$city <- c("Berlin", "Hoppegarten")
You can remove elements by assigning the NULL value to them:
person$city <- NULL
If you use single brackets [, you will get a piece of the “clothesline”, that is, you will produce a smaller list.
person["name"]
class(person["name"])    # "list": a smaller list
class(person[["name"]])  # "character": the element itself
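A list can hold elements of different types and lengths, and [[ ]] can be followed by ordinary vector indexing. A base-R sketch building on the person list from above:

```r
person <- list(name="Weiner", Age=NA, given="January")
person$city <- c("Berlin", "Hoppegarten")   # a character vector of length 2

length(person)             # 4 elements
names(person)              # "name" "Age" "given" "city"
person[["city"]][2]        # "Hoppegarten": [[ ]] gets the vector, [ ] indexes it
person[c("name", "city")]  # single brackets with several names: a smaller list
str(person)                # compact overview of the whole structure
```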
Caveats:
- to get a single element, use [[, not [
- elements can have names (names()), but don't have to
Data frames are a bit like matrices, but each column can store a different type of data. In this respect they are more like lists (which, in fact, they are).
names <- c("January", "Manuela", "Bill")
lastn <- c("Weiner", "Benary", "Gates")
age <- c(1001, NA, 65)
d <- data.frame(names=names, last_names=lastn, age=age)
class(d)
class(d[,1])
class(d[,3])
You can access the data frame elements much like the elements of a matrix.
However, since data frames are lists, the list operator ($) also works:
d$names          # same as d[,1] or d[, "names"]
d$last_names
d$last_names[1]
However, note that when you select a row, you get a data frame, not a vector. This is because each column can be of a different type, and a vector can hold only one type of data.
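The difference between selecting a row and selecting a column can be checked with class(). The last line also shows rows being selected with a logical vector; !is.na() is added so that the row with a missing age is not returned as a row of NAs. A base-R sketch with the data frame from above:

```r
names <- c("January", "Manuela", "Bill")
lastn <- c("Weiner", "Benary", "Gates")
age <- c(1001, NA, 65)
d <- data.frame(names=names, last_names=lastn, age=age)

row1 <- d[1, ]          # one row: still a data frame
class(row1)             # "data.frame"
col1 <- d[, "age"]      # one column: a plain vector
class(col1)             # "numeric"
d[2, "last_names"]      # a single cell: "Benary"
nrow(d)                 # 3
d[!is.na(d$age) & d$age > 100, ]  # rows selected with a logical vector
```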
You can add new columns to a data frame using the $ operator:
d$city <- c("Hoppegarten", "Berlin", "Seattle")
You can remove columns by assigning the NULL value to them:
d$city <- NULL
Caveats:
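- stringsAsFactors=FALSE: in R versions before 4.0.0, data.frame() converted character columns to factors by default, so older code often sets this argument.
Gory details: a matrix is a basic data type (a vector with dimensions), whereas a data frame is a list.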
Caveats:
Tibbles are the data frames of the tidyverse
Almost anything you can do to a data frame, you can do to a tibble as well
The read_* functions of readr (e.g. read_csv) return a tibble
Tibbles do not have row names
If you select a single row of a data frame with [ ], you get a smaller data frame. If you select a single column, you get a vector.
With a tibble, [ ] always gives you a smaller tibble (use $ or [[ ]] to get a column as a vector).
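In base R, the behavior of a data frame can be made to match the tibble one with the drop argument of [ ]. A short sketch with a made-up data frame (the tibble remark in the comment assumes the tibble package is installed):

```r
d <- data.frame(x=1:3, y=c("a", "b", "c"))

class(d[, "x"])              # "integer": a single column is simplified to a vector
class(d[, "x", drop=FALSE])  # "data.frame": drop=FALSE keeps the data frame
class(d[["x"]])              # "integer": [[ ]] and $ return the column itself
# With a tibble, t[, "x"] would return a one-column tibble,
# i.e. it behaves as if drop=FALSE were the default.
```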
https://youtu.be/eWu7kvNBpyc (9.2 minutes)
- … matrix and rnorm.
- … Use as.data.frame for that.
- … Use the rep function for that.
- … Use the seq function for that.
[ ] … accessing elements of a vector / matrix / list / data frame (extraction operators)
[[ ]] … accessing the element/items of a list
$ … accessing elements by name
( ) … used when calling a function to provide arguments
{ } … indicating a block, e.g. when defining a function
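All of these operators can be seen together in a few lines. The objects v, l, df and the function add_one are made up for illustration:

```r
v <- c(a=1, b=2, c=3)
l <- list(x=1:3, y="text")
df <- data.frame(id=1:2, score=c(0.5, 0.8))

v[2]            # [ ]  : element of a vector by position (named "b")
v["c"]          # [ ]  : element of a vector by name
l[["x"]]        # [[ ]]: the list element itself, the vector 1 2 3
l$y             # $    : list element by name: "text"
df$score        # $    : a data frame column: 0.5 0.8
df[2, "score"]  # [ ]  : row 2, column "score": 0.8
mean(df$score)  # ( )  : calling a function with an argument: 0.65
add_one <- function(x) { x + 1 }  # { }: the block forming the function body
add_one(v)      # 2 3 4, names are kept
```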